
About the Provider

OpenAI, the organization behind Whisper Large v3, is a major AI research lab and platform provider that builds a wide range of generative models spanning text, image, code, and audio.

Model Quickstart

This section helps you quickly get started with the openai/whisper-large-v3 model on the Qubrid AI inferencing platform. To use this model, you need:
  • A valid Qubrid API key
  • Access to the Qubrid inference API
  • Basic knowledge of making API requests in your preferred language
Once authenticated with your API key, you can send inference requests to the openai/whisper-large-v3 model and receive responses based on your input. The example below shows how to call the model from Python; adapt it to whichever environment best fits your workflow.
import requests

url = "https://platform.qubrid.com/api/v1/qubridai/audio/transcribe"

headers = {
    "Authorization": "Bearer <QUBRID_API_KEY>"
}

data = {
    "model": "openai/whisper-large-v3",
    "language": "en"
}

# Open the audio file in binary mode; the context manager ensures
# the file handle is closed once the request has been sent.
with open("audio.wav", "rb") as audio_file:
    response = requests.post(
        url,
        headers=headers,
        files={"file": audio_file},
        data=data
    )

response.raise_for_status()  # fail loudly on HTTP errors
print(response.json())
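The response schema is not documented above; assuming the endpoint returns a top-level "text" field or a "segments" list (both field names are assumptions, not confirmed by Qubrid's API reference), a small helper can extract the transcript defensively:

```python
def extract_transcript(payload: dict) -> str:
    """Pull the transcript out of a response payload.

    Assumes a top-level "text" field; falls back to joining
    per-segment text if only "segments" metadata is returned.
    (Both field names are assumptions about the API schema.)
    """
    if "text" in payload:
        return payload["text"]
    segments = payload.get("segments", [])
    return " ".join(seg.get("text", "") for seg in segments).strip()

# Example with a mocked payload:
sample = {"segments": [{"text": "Hello"}, {"text": "world."}]}
print(extract_transcript(sample))  # → Hello world.
```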

Model Overview

Whisper Large V3 is a general-purpose speech recognition and speech translation model developed by OpenAI. It is designed for high-accuracy automatic speech recognition (ASR) across a wide range of languages, audio qualities, and recording conditions. The model is trained on more than 5 million hours of labeled and pseudo-labeled audio, enabling strong zero-shot performance across datasets and domains. Whisper Large V3 improves upon previous versions with better multilingual accuracy and enhanced audio representation.

Model at a Glance

Feature              | Details
Model ID             | openai/whisper-large-v3
Provider             | OpenAI
Model Type           | Speech-to-Text (ASR) & Speech Translation
Architecture         | Encoder-Decoder Transformer
Context Length       | 30 seconds per audio chunk
Model Size           | 1.55B parameters
Inference Parameters | 8

Supported languages

Code | Language   | Code | Language
en   | English    | es   | Spanish
fr   | French     | de   | German
zh   | Chinese    | ja   | Japanese
ko   | Korean     | ru   | Russian
ar   | Arabic     | hi   | Hindi
pt   | Portuguese | it   | Italian

When to Use

You should consider using Whisper Large V3 if:
  • Transcription accuracy is more important than speed
  • Your application requires support for many languages
  • You work with noisy, low-quality, or challenging audio
  • You need reliable speech recognition for long-form audio
  • Your workflow includes speech translation or language identification

Inference Parameters

Parameter Name  | Type    | Default                       | Description
Task            | select  | transcribe                    | Choose whether to transcribe in the original language or translate to English.
Language        | select  | en                            | Select the spoken language (auto-detect if unsure).
Temperature     | number  | 0                             | Controls randomness of output; 0.0 is deterministic.
Initial Prompt  | string  | Business meeting conversation | Guides the model toward the expected audio context.
Word Timestamps | boolean | true                          | Return per-word timestamps with the transcription.
VAD Filter      | boolean | true                          | Enable for long pauses or background noise; disable for tightly trimmed clips to save compute.
Return Segments | boolean | true                          | Return the transcription with time-segment metadata.
Output Format   | select  | json                          | Choose the transcription output format.
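A sketch of a form-data payload covering the eight parameters above. The snake_case field names are assumptions about the request schema, not confirmed field names; check Qubrid's API reference for the exact spellings:

```python
def build_transcribe_payload(
    task: str = "transcribe",
    language: str = "en",
    temperature: float = 0.0,
    initial_prompt: str = "Business meeting conversation",
    word_timestamps: bool = True,
    vad_filter: bool = True,
    return_segments: bool = True,
    output_format: str = "json",
) -> dict:
    """Build the form-data dict for a transcription request.

    Field names are assumptions mirroring the parameter table;
    verify them against the official API reference.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    return {
        "model": "openai/whisper-large-v3",
        "task": task,
        "language": language,
        "temperature": temperature,
        "initial_prompt": initial_prompt,
        "word_timestamps": word_timestamps,
        "vad_filter": vad_filter,
        "return_segments": return_segments,
        "output_format": output_format,
    }

payload = build_transcribe_payload(task="translate", language="de")
print(payload["task"])  # → translate
```

The resulting dict would be passed as the `data=` argument of the `requests.post` call shown in the quickstart, alongside the uploaded file.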

Key Model Features

  • Uses 128 Mel frequency bins in the spectrogram input (previous versions used 80)
  • Trained on 1M hours of weakly labeled audio and 4M hours of pseudo-labeled audio
  • Shows 10–20% error reduction compared to Whisper Large V2 across many languages
  • Designed to handle noisy audio, varied recording conditions, and diverse accents

Supported Capabilities

  • Multilingual speech recognition
  • Speech translation
  • Language identification
  • Short-form and long-form transcription

Best Practices

  • Use this model when accuracy is the top priority
  • Process long audio using 30-second segments for optimal performance
  • Prefer sequential long-form transcription for maximum accuracy
  • Use chunked long-form transcription when faster processing is required
  • Rely on this model for challenging audio conditions

Summary

Whisper Large V3 is a high-accuracy speech recognition and translation model. It supports transcription across more than 99 languages. The model improves accuracy over previous versions with better audio representation. It is optimized for both short-form and long-form audio processing and is best suited for applications where transcription quality is critical.